O(N)-Space Spatiotemporal Filter for Reducing Noise in Neuromorphic Vision Sensors
Neuromorphic vision sensors are an emerging technology inspired by how the retina processes images. A neuromorphic vision sensor only reports when a pixel value changes rather than continuously outputting the value every frame as is done in an 'ordinary' Active Pixel Sensor (APS). This move from a continuously sampled system to an asynchronous, event-driven one effectively allows for much faster sampling rates; it also fundamentally changes the sensor interface. In particular, these sensors are highly sensitive to noise, as any additional event reduces the bandwidth and thus effectively lowers the sampling rate. In this work we introduce a novel spatiotemporal filter with O(N) memory complexity for reducing background activity noise in neuromorphic vision sensors. Our design consumes 10× less memory and achieves a 100× reduction in error compared to previous designs. Our filter is also capable of recovering real events and can pass up to 180 percent more real events.
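The core of the O(N) memory footprint is to keep one small memory cell per row and per column of the sensor instead of one timestamp per pixel. As a rough illustration only, the Python sketch below implements a background-activity filter in that style; the class name, the 5 ms correlation window, and the exact neighborhood test are assumptions made for this example, not the paper's verified design.

```python
import numpy as np

class RowColBackgroundFilter:
    """Spatiotemporal background-activity filter with O(N) memory:
    one (timestamp, coordinate) cell per column and per row instead of
    one timestamp per pixel (which would be O(N^2))."""

    def __init__(self, width, height, dt_us=5000):
        self.dt = dt_us  # correlation window in microseconds (illustrative value)
        self.col_ts = np.zeros(width, dtype=np.int64)   # last event time seen in each column
        self.col_y = np.full(width, -1)                 # ...and its row coordinate
        self.row_ts = np.zeros(height, dtype=np.int64)  # last event time seen in each row
        self.row_x = np.full(height, -1)                # ...and its column coordinate

    def __call__(self, x, y, t):
        """Return True if the event at (x, y, t) is supported by a recent
        neighboring event stored in the row/column memories."""
        keep = (
            (self.col_y[x] >= 0 and abs(int(self.col_y[x]) - y) <= 1
             and t - self.col_ts[x] <= self.dt)
            or
            (self.row_x[y] >= 0 and abs(int(self.row_x[y]) - x) <= 1
             and t - self.row_ts[y] <= self.dt)
        )
        # Always refresh the memories so later events can correlate with this one.
        self.col_ts[x], self.col_y[x] = t, y
        self.row_ts[y], self.row_x[y] = t, x
        return bool(keep)

# Example: 346x260 sensor; isolated noise events are rejected, correlated ones pass.
filt = RowColBackgroundFilter(346, 260)
print(filt(100, 50, t=1_000))   # first event at this location: no support yet -> False
print(filt(100, 51, t=2_000))   # neighbor within the window -> True
```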
A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers
Nanopore sequencing generates noisy electrical signals that need to be
converted into a standard string of DNA nucleotide bases using a computational
step called basecalling. The accuracy and speed of basecalling have critical
implications for all later steps in genome analysis. Many researchers adopt
complex deep learning-based models to perform basecalling without considering
the compute demands of such models, which leads to slow, inefficient, and
memory-hungry basecallers. Therefore, there is a need to reduce the computation
and memory cost of basecalling while maintaining accuracy. Our goal is to
develop a comprehensive framework for creating deep learning-based basecallers
that provide high efficiency and performance. We introduce RUBICON, a framework
to develop hardware-optimized basecallers. RUBICON consists of two novel
machine-learning techniques that are specifically designed for basecalling.
First, we introduce the first quantization-aware basecalling neural
architecture search (QABAS) framework to specialize the basecalling neural
network architecture for a given hardware acceleration platform while jointly
exploring and finding the best bit-width precision for each neural network
layer. Second, we develop SkipClip, the first technique to remove the skip
connections present in modern basecallers to greatly reduce resource and
storage requirements without any loss in basecalling accuracy. We demonstrate
the benefits of RUBICON by developing RUBICALL, the first hardware-optimized
basecaller that performs fast and accurate basecalling. Compared to the fastest
state-of-the-art basecaller, RUBICALL provides a 3.96x speedup with 2.97%
higher accuracy. We show that RUBICON helps researchers develop
hardware-optimized basecallers that are superior to expert-designed models.
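QABAS jointly searches the network architecture and a per-layer bit-width; the full search is beyond the scope of an abstract, but the quantization side can be sketched. Below is an assumed, simplified symmetric fake-quantization routine of the kind such a search could evaluate per layer; the function name, the candidate bit-widths, and the scheme itself are illustrative and are not taken from RUBICON.

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform fake quantization: round to `bits`-wide integers and
    immediately dequantize, so training sees the precision loss in float."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

# A toy per-layer exploration: measure how much each candidate bit-width
# perturbs a layer's weights, as a stand-in for a real accuracy/latency model.
layer_weights = torch.randn(256, 256)
for bits in (8, 6, 4, 2):                      # candidate precisions (assumed)
    err = (layer_weights - fake_quantize(layer_weights, bits)).abs().mean()
    print(f"{bits}-bit mean absolute error: {err:.4f}")
```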
Tailor: Altering Skip Connections for Resource-Efficient Inference
Deep neural networks use skip connections to improve training convergence.
However, these skip connections are costly in hardware, requiring extra buffers
and increasing on- and off-chip memory utilization and bandwidth requirements.
In this paper, we show that skip connections can be optimized for hardware when
tackled with a hardware-software codesign approach. We argue that while a
network's skip connections are needed for the network to learn, they can later
be removed or shortened to provide a more hardware efficient implementation
with minimal to no accuracy loss. We introduce Tailor, a codesign tool whose
hardware-aware training algorithm gradually removes or shortens a fully trained
network's skip connections to lower their hardware cost. Tailor improves
resource utilization by up to 34% for BRAMs, 13% for FFs, and 16% for LUTs for
on-chip, dataflow-style architectures. Tailor increases performance by 30% and
reduces memory bandwidth by 45% for a 2D processing element array architecture.
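To make the removal idea concrete, here is an assumed minimal sketch of one way a skip connection can be phased out during fine-tuning: scale the identity path by a factor that is annealed from one to zero, after which the path and its buffers can be dropped entirely. The block layout, the linear schedule, and the module names are illustrative assumptions, not Tailor's actual training algorithm.

```python
import torch
import torch.nn as nn

class FadingSkipBlock(nn.Module):
    """Residual block whose identity path is scaled by `alpha`. Annealing
    alpha from 1.0 to 0.0 during fine-tuning removes the skip connection,
    so its extra on-chip buffer is no longer needed at inference time."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.alpha = 1.0  # skip strength; 1.0 = normal residual, 0.0 = removed

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + self.alpha * x

def anneal_skips(blocks, epoch: int, start: int, duration: int) -> None:
    """Linearly fade every block's skip connection over `duration` epochs."""
    progress = min(max(epoch - start, 0) / duration, 1.0)
    for b in blocks:
        b.alpha = 1.0 - progress

# Usage sketch: fine-tune a trained model while the skips fade away.
blocks = [FadingSkipBlock(64) for _ in range(3)]
for epoch in range(10):
    anneal_skips(blocks, epoch, start=2, duration=5)
```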
Microscaling Data Formats for Deep Learning
Narrow bit-width data formats are key to reducing the computational and
storage costs of modern deep learning applications. This paper evaluates
Microscaling (MX) data formats that combine a per-block scaling factor with
narrow floating-point and integer types for individual elements. MX formats
balance the competing needs of hardware efficiency, model accuracy, and user
friction. Empirical results on over two dozen benchmarks demonstrate the
practicality of MX data formats as a drop-in replacement for baseline FP32 for
AI inference and training with low user friction. We also show the first
instance of training generative language models at sub-8-bit weights,
activations, and gradients with minimal accuracy loss and no modifications to
the training recipe.
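As a concrete picture of the per-block idea, the numpy sketch below encodes one block of FP32 values with a shared power-of-two scale and signed 8-bit integer elements. The 32-element block size, the scale selection, and the rounding are simplified assumptions for illustration and do not reproduce the exact MX specification.

```python
import numpy as np

def block_encode_int8(block: np.ndarray):
    """Encode one block of FP32 values as a shared power-of-two scale plus
    int8 elements, in the spirit of a per-block scaled narrow format."""
    max_abs = float(np.max(np.abs(block)))
    if max_abs == 0.0:
        return 1.0, np.zeros_like(block, dtype=np.int8)
    # Smallest power-of-two scale that keeps every element inside int8 range.
    scale = 2.0 ** np.ceil(np.log2(max_abs / 127.0))
    q = np.clip(np.round(block / scale), -127, 127).astype(np.int8)
    return scale, q

def block_decode(scale: float, q: np.ndarray) -> np.ndarray:
    return scale * q.astype(np.float32)

block = np.random.randn(32).astype(np.float32)   # one 32-element block (assumed size)
scale, q = block_encode_int8(block)
print("shared scale:", scale)
print("max reconstruction error:", np.max(np.abs(block - block_decode(scale, q))))
```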
Reshaping Deep Neural Networks for Efficient Hardware Inference
The latest Deep Learning (DL) methods for designing Deep Neural Networks (DNN) have significantly expanded our ability to train data processing systems. Coupled with exponential growth in available digital data, we have seen dramatic accuracy improvements in DNNs and widespread adoption of these models in different applications. This increased demand has motivated innovations in DNN architecture design to deliver high-quality output. For example, advanced DL models can include irregular connections between their layers, have more parameters, and employ computationally complex neurons. Unfortunately, these new architectural additions often increase the implementation complexity of the DNNs on hardware, particularly when deploying DL models for inference in scale-out and power-limited systems. Currently, to deploy a DNN on a custom platform, an abstract of the DL model is used to create a functionally identical realization. However, because altering this abstract changes the functionality of the DL model, hardware designers keep the model unchanged for a lossless implementation. This thesis shows that a co-design approach can improve the hardware implementation of DL models. In a co-design approach, the designer reshapes the DNN architecture to better fit a target processing platform and preserves its accuracy by retraining the model. We describe a custom accelerator for Spiking Neural Networks (SNN) with improved computational cost and memory utilization achieved by reshaping the layers and neurons of the model. We then apply these changes to existing SNN models and show that they can maintain their accuracy after the reshaping and retraining. In addition, we introduce novel applications for SNNs based on the new architecture. We also present a stochastic noise filter for pre-processing the SNN's input with improved accuracy and memory utilization. Furthermore, we explain a reshaping method for Residual Networks (ResNet) to reduce their memory footprint while preserving their accuracy. This thesis also introduces a method for accelerating the co-design process. Reshaping DL models can increase the complexity of their training stage. We present an auto-tuner for the learning rate (an essential parameter for training DNNs) that simplifies manual tuning of this parameter and can accelerate the retraining of DL models.
Benchmarking vision kernels and neural network inference accelerators on embedded platforms
Developing efficient embedded vision applications requires exploring various algorithmic optimization trade-offs and a broad spectrum of hardware architecture choices. This makes navigating the solution space and finding the design points with optimal performance trade-offs a challenge for developers. To help provide a fair baseline comparison, we conducted comprehensive benchmarks of accuracy, run-time, and energy efficiency of a wide range of vision kernels and neural networks on multiple embedded platforms: ARM A57 CPU, Nvidia Jetson TX2 GPU, and Xilinx ZCU102 FPGA. Each platform utilizes its optimized libraries for vision kernels (OpenCV, VisionWorks and xfOpenCV) and neural networks (OpenCV DNN, TensorRT and Xilinx DPU). For vision kernels, our results show that the GPU achieves an energy/frame reduction ratio of 1.1–3.2 compared to the others for simple kernels. However, for more complicated kernels and complete vision pipelines, the FPGA outperforms the others with energy/frame reduction ratios of 1.2–22.3. For neural networks [Inception-v2, ResNet-50, ResNet-18, Mobilenet-v2 and SqueezeNet], the FPGA achieves a speedup of [2.5, 2.1, 2.6, 2.9 and 2.5] and an EDP reduction ratio of [1.5, 1.1, 1.4, 2.4 and 1.7] compared to the GPU FP16 implementations, respectively. This is a manuscript of an article published as Qasaimeh, Murad, Kristof Denolf, Alireza Khodamoradi, Michaela Blott, Jack Lo, Lisa Halder, Kees Vissers, Joseph Zambreno, and Phillip H. Jones. "Benchmarking vision kernels and neural network inference accelerators on embedded platforms." Journal of Systems Architecture (2020): 101896. DOI: 10.1016/j.sysarc.2020.101896. Posted with permission.
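For readers unfamiliar with the metrics compared above, the short Python snippet below shows how an energy/frame reduction ratio and an energy-delay product (EDP) ratio are computed. All numbers are made up for illustration and are not measurements from the paper.

```python
# Hypothetical per-frame measurements (assumed values, not from the paper).
energy_per_frame_gpu = 30.0    # mJ/frame
energy_per_frame_fpga = 2.0    # mJ/frame
energy_reduction_ratio = energy_per_frame_gpu / energy_per_frame_fpga  # 15.0

latency_gpu, power_gpu = 0.020, 10.0     # s/frame, W
latency_fpga, power_fpga = 0.008, 5.0    # s/frame, W
# EDP = energy x delay = (power x latency) x latency
edp_gpu = (latency_gpu * power_gpu) * latency_gpu
edp_fpga = (latency_fpga * power_fpga) * latency_fpga
print(energy_reduction_ratio, edp_gpu / edp_fpga)   # EDP reduction ratio
```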
Epidemiological and clinical features of 2019 novel coronavirus diseases (COVID-19) in the South of Iran
BACKGROUND: In March 2020, the WHO declared the novel coronavirus (COVID-19) outbreak a global pandemic. Although the number of infected cases is increasing, information about its clinical characteristics in the Middle East, especially in Iran, a country considered to be one of the most important focal points of the disease in the world, is lacking. To date, there is no available literature on clinical data for COVID-19 patients in Iran. METHODS: In this multicenter retrospective study, 113 hospitalized confirmed cases of COVID-19 admitted to university-affiliated hospitals in Shiraz, Iran from February 20 to March 20 were enrolled. RESULTS: The mean age was 53.75 years and 71 (62.8%) were males. The most common symptoms at onset were fatigue (75: 66.4%), cough (73: 64.6%), and fever (67: 59.3%). Laboratory data revealed a significant correlation of lymphocyte count (P value = 0.003), partial thromboplastin time (P value = 0.000), and international normalized ratio (P value = 0.000) with the severity of the disease. The most common abnormality on chest CT scans was ground-glass opacity (77: 93.9%), followed by consolidation (48: 58.5%). Our results revealed an overall mortality rate of 8% (9 out of 113 cases), with the majority of deaths occurring among patients admitted to the ICU (5: 55.6%). CONCLUSION: Evaluating the clinical data of COVID-19 patients, finding the source of infection, and studying the behavior of the disease are crucial for understanding the pandemic.